Abstract. We consider the convex minimization model with both linear equality and inequality
constraints, and reshape the classic augmented Lagrangian method (ALM) by balancing its
subproblems. As a result, one of its subproblems decouples the objective function and the coefficient
matrix without any extra condition, and the other subproblem becomes a positive definite system of
linear equations or a positive definite linear complementarity problem. The balanced ALM advances
the classic ALM by enlarging its applicable range, balancing its subproblems, and improving its
implementation. We also extend our discussion to two-block and multiple-block separable convex
programming models, and accordingly design various splitting versions of the balanced ALM for
these separable models. Convergence analysis for the balanced ALM and its splitting versions is
conducted in the context of variational inequalities through the lens of the classic proximal point
algorithm.

1 Introduction


The classic augmented Lagrangian method (ALM) was proposed in [25, 30], and since then it has
played a fundamental role in algorithmic design for various convex programming problems.
For instance, it is the root of the alternating direction method of multipliers (ADMM) proposed
in [13], which is nowadays a benchmark algorithm used widely in many areas. We refer to,
e.g. 3612414482], for insightful discussions on the ALM and its wide applications in different
areas such as PDEs, optimization, optimal control, image processing, and scientific computing.
In particular, it was shown that the ALM is an application of the classic proximal point
algorithm (PPA), which was originally proposed in [27].

Let us start with the following canonical convex minimization model with linear equality
constraints:

\[ \min\{\theta(x) \mid Ax = b,\ x \in \mathcal{X}\}, \tag{1.1} \]

where $\theta : \mathbb{R}^n \to \mathbb{R}$ is a closed proper convex but not necessarily smooth function; $\mathcal{X} \subseteq \mathbb{R}^n$ is a
closed convex set; $A \in \mathbb{R}^{m \times n}$; and $b \in \mathbb{R}^m$. The iterative scheme of the ALM for (1.1) reads as

\[ \text{(ALM)} \qquad x^{k+1} \in \arg\min\Big\{\theta(x) - (\lambda^k)^T(Ax - b) + \frac{r}{2}\|Ax - b\|^2 \;\Big|\; x \in \mathcal{X}\Big\}, \tag{1.2a} \]

\[ \phantom{\text{(ALM)}} \qquad \lambda^{k+1} = \lambda^k - r(Ax^{k+1} - b), \tag{1.2b} \]

in which $r > 0$ is the penalty parameter and $\lambda \in \mathbb{R}^m$ is the Lagrange multiplier. Hereafter, $x$ and
$\lambda$ are referred to as the primal and dual variables, respectively. In general, the subproblem (1.2a)
needs to be solved iteratively, and thus nested outer-inner iterations are required to implement
the ALM (1.2). Therefore, how to solve the x-subproblem (1.2a) determines the difficulty of
implementing the ALM (1.2). An obvious obstacle is that the objective function $\theta(x)$, the
coefficient matrix $A$, and the set $\mathcal{X}$ are all aggregated and must be considered simultaneously in the
x-subproblem (1.2a). Thus, the x-subproblem dominates the computation while the
$\lambda$-subproblem (1.2b) is trivial. In this sense, these two subproblems in the classic ALM are
unbalanced.

In this paper, we suggest decoupling the objective function $\theta(x)$ and the coefficient matrix
$A$ in the subproblem (1.2a) so as to alleviate this subproblem substantially, and then shifting the
consideration of the matrix $A$ to the subproblem (1.2b). The classic ALM (1.2) is thus reshaped,
and the resulting subproblems are balanced in the sense that the difficulty of the x-subproblem
depends only on $\theta(x)$ and $\mathcal{X}$, and that of the $\lambda$-subproblem depends on $A$. This
balancing idea has an immediate advantage when the function $\theta(x)$ has the favorable property
that its proximity operator, defined by

\[ \mathrm{Prox}_{\theta}(x) := \arg\min\Big\{\theta(y) + \frac{r}{2}\|y - x\|^2 \;\Big|\; y \in \mathbb{R}^n\Big\}, \quad \forall x \in \mathbb{R}^n,\ \forall r > 0, \tag{1.3} \]

has a closed-form representation. This scenario arises in many applications, especially in
contemporary data science domains. We refer to, e.g., [4\/3}/32), for some applications whose
corresponding function $\theta(x)$ usually promotes sparsity or low-rank properties of a desired solution
and hence can be specified as the $\ell_1$-norm function (or the nuclear-norm function for the case
with matrix variables). Our idea of decoupling $\theta(x)$ and $A$ can be further explained by the
following motivation. Ignoring some constant terms, we know that the subproblem (1.2a) can
be rewritten as

\[ x^{k+1} \in \arg\min\Big\{\theta(x) + \frac{r}{2}\Big\|Ax - b - \frac{1}{r}\lambda^k\Big\|^2 \;\Big|\; x \in \mathcal{X}\Big\}. \]

For the classic ALM (1.2), even when $\mathrm{Prox}_{\theta}(x)$ has a closed-form representation and $\mathcal{X} = \mathbb{R}^n$,
in general the subproblem (1.2a) may still be difficult when the matrix $A$ is not the identity. If $\theta(x)$
and $A$ are decoupled and the original x-subproblem (1.2a) is replaced by an easier one in the form
of

\[ x^{k+1} = \arg\min\Big\{\theta(x) + \frac{r}{2}\|x - q^k\|^2 \;\Big|\; x \in \mathcal{X}\Big\}, \tag{1.4} \]

in which $q^k \in \mathbb{R}^n$ is a certain constant vector, then the solution of (1.4) can also be given by the
closed-form representation of $\mathrm{Prox}_{\theta}(x)$ when $\mathcal{X} = \mathbb{R}^n$.
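
As a concrete instance of (1.3), the following minimal sketch evaluates the proximity operator of the $\ell_1$-norm, which is the componentwise soft-thresholding map with threshold $1/r$. The function name `prox_l1` and the sample vector are illustrative assumptions, not part of the original text.

```python
import numpy as np

def prox_l1(x, r):
    """Proximity operator (1.3) for theta = ||.||_1:
    argmin_y ||y||_1 + (r/2) * ||y - x||^2, solved componentwise."""
    x = np.asarray(x, dtype=float)
    # Soft-thresholding: shrink every entry toward zero by 1/r.
    return np.sign(x) * np.maximum(np.abs(x) - 1.0 / r, 0.0)

# Illustrative input (assumed data): with r = 2 the threshold is 0.5.
z = np.array([3.0, -0.2, 0.5, -2.0])
p = prox_l1(z, r=2.0)
print(p)
```

Entries whose magnitude does not exceed $1/r$ are set to zero, which is why this operator promotes sparsity.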

Some existing algorithms in the literature can be applied to (1.1), and $\theta(x)$ and $A$ can
be decoupled in their implementations. For example, as analyzed in [20], we can consider
regularizing the objective function of (1.2a) with a proximal term. The resulting proximal
version of the ALM (PALM for short) can be written as

\[ \text{(PALM)} \qquad x^{k+1} \in \arg\min\Big\{\theta(x) - (\lambda^k)^T(Ax - b) + \frac{r}{2}\|Ax - b\|^2 + \frac{1}{2}\|x - x^k\|_G^2 \;\Big|\; x \in \mathcal{X}\Big\}, \tag{1.5a} \]

\[ \phantom{\text{(PALM)}} \qquad \lambda^{k+1} = \lambda^k - r(Ax^{k+1} - b), \tag{1.5b} \]

with $G \in \mathbb{R}^{n \times n}$ and the notation $\|x\|_G^2 := x^T G x$. If we choose $G = \sigma I_n - rA^TA$ in (1.5a) with
$\sigma > 0$, then the generic PALM (1.5) is specified as

\[ \text{(LALM)} \qquad x^{k+1} \in \arg\min\Big\{\theta(x) + \frac{\sigma}{2}\Big\|x - \Big(x^k + \frac{1}{\sigma}A^T\big(\lambda^k - r(Ax^k - b)\big)\Big)\Big\|^2 \;\Big|\; x \in \mathcal{X}\Big\}, \tag{1.6a} \]

\[ \phantom{\text{(LALM)}} \qquad \lambda^{k+1} = \lambda^k - r(Ax^{k+1} - b), \tag{1.6b} \]

in which the subproblem (1.6a) is in the form of (1.4), and thus it is reduced to $\mathrm{Prox}_{\theta}(x)$ when
$\mathcal{X} = \mathbb{R}^n$. The scheme (1.6) is called the linearized ALM (LALM for short) because the quadratic
term $\frac{r}{2}\|Ax - b\|^2$ in (1.5a) is "linearized" by $G = \sigma I_n - rA^TA$. From an analytical point of view,
it is easy to see that if $\sigma$ is large enough such that $\sigma > r\|A^TA\|$, then the matrix $G = \sigma I_n - rA^TA$
is positive definite, and essentially the convergence analysis of the LALM can be conducted by following
existing works such as (11}{79}[34}[35). This means the classic ALM can be revised as the
LALM (1.6) to decouple $\theta(x)$ and $A$. But the subproblem (1.6a) is correlated implicitly with
$A$ because of the condition $\sigma > r\|A^TA\|$. On the other hand, note that the approximation of
(1.6a) to the original x-subproblem (1.2a) is less accurate when $\sigma$ is larger, because of the higher
weight of the additional quadratic term $\frac{1}{2}\|x - x^k\|_G^2$ with $G = \sigma I_n - rA^TA$. When $\|A^TA\|$ is
large, $\sigma$ is forced to be large, and the consequence is that the resulting step size becomes
small and more outer iterations are inevitably needed, even though the inner iterations
can be avoided. In [20], it is shown that the best bound of $\sigma$ is $0.75 \cdot r\|A^TA\|$ to ensure the
convergence of (1.6), while the mentioned difficulty remains if $\|A^TA\|$ is too large.
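
The passage from the PALM x-subproblem to the LALM x-subproblem can be verified by completing the square. The following worked derivation is a sketch under the choice $G = \sigma I_n - rA^TA$; here "const" collects every term that does not depend on $x$.

```latex
\begin{aligned}
&-(\lambda^k)^T(Ax-b) + \frac{r}{2}\|Ax-b\|^2 + \frac{1}{2}\|x-x^k\|_G^2 \\
&\quad= -(\lambda^k)^T(Ax-b) + \frac{r}{2}\|Ax-b\|^2
        + \frac{\sigma}{2}\|x-x^k\|^2 - \frac{r}{2}\|A(x-x^k)\|^2 \\
&\quad= \frac{\sigma}{2}\|x-x^k\|^2
        - x^T A^T\big(\lambda^k - r(Ax^k-b)\big) + \text{const} \\
&\quad= \frac{\sigma}{2}\Big\|x - \Big(x^k + \frac{1}{\sigma}A^T\big(\lambda^k - r(Ax^k-b)\big)\Big)\Big\|^2 + \text{const}.
\end{aligned}
```

The second equality uses $\frac{r}{2}\|Ax-b\|^2 - \frac{r}{2}\|A(x-x^k)\|^2 = r\,x^TA^T(Ax^k-b) + \text{const}$, so the quadratic coupling $x^TA^TAx$ cancels exactly and only a proximal term in $x$ remains.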

There is another algorithm that can be applied to the problem (1.1) and in whose implementation $\theta(x)$ and $A$ can
be decoupled. More specifically, let us consider the Lagrangian function
of (1.1) and its saddle-point reformulation, and then apply the primal-dual method proposed
in [5]. With some tedious details skipped, the resulting iterative scheme can be written as

\[ x^{k+1} = \arg\min\Big\{\theta(x) + \frac{r}{2}\Big\|x - \Big(x^k + \frac{1}{r}A^T\lambda^k\Big)\Big\|^2 \;\Big|\; x \in \mathcal{X}\Big\}, \tag{1.7a} \]

\[ \lambda^{k+1} = \lambda^k - \frac{1}{s}\big(A(2x^{k+1} - x^k) - b\big), \tag{1.7b} \]

where $r > 0$ and $s > 0$ are parameters for the primal- and dual-variable subproblems,
respectively. In (1.7a), $\theta(x)$ and $A$ are also decoupled, and this subproblem is also reduced to $\mathrm{Prox}_{\theta}(x)$
when $\mathcal{X} = \mathbb{R}^n$. Nearly at the same time as [5], the primal-dual method (1.7) was
explained as an application of the classic PPA in [22], and this PPA explanation has since been
used to analyze the convergence of variants of the primal-dual method (1.7), as well as other
first-order algorithms, in the literature; see, e.g. (2\(71[29]. To ensure the convergence of (1.7), as
analyzed in [5], the condition

\[ rs > \|A^TA\| \tag{1.8} \]

is required. Following the PPA explanation in [22], the condition (1.8) is used to ensure the
positive definiteness of the matrix that is used to define the underlying PPA. We refer to,
e.g. [511622124] for some efficient applications of the primal-dual method to some image
reconstruction problems whose corresponding $\|A^TA\|$ is small. Therefore, although the
objective function $\theta(x)$ and the coefficient matrix $A$ are decoupled in notation, the subproblem (1.7a)
is correlated implicitly with $A$ because of the condition (1.8). Clearly, the same difficulties
as those for implementing the PALM (1.6) must be tackled if $\|A^TA\|$ is too large.

Our main purpose is to balance the subproblems of the classic ALM (1.2) such that both
subproblems could be easy for some applications. More specifically, let $r > 0$ and $\delta > 0$ be
arbitrary constants, and define the positive definite matrix $H_0 \in \mathbb{R}^{m \times m}$ as

\[ H_0 := \Big(\frac{1}{r}AA^T + \delta I_m\Big). \tag{1.9} \]

Then, with $q_0^k := x^k + \frac{1}{r}A^T\lambda^k$, the classic ALM (1.2) for the problem (1.1) is balanced as

\[ \text{(Balanced ALM)} \qquad x^{k+1} = \arg\min\Big\{\theta(x) + \frac{r}{2}\|x - q_0^k\|^2 \;\Big|\; x \in \mathcal{X}\Big\}, \tag{1.10a} \]

\[ \phantom{\text{(Balanced ALM)}} \qquad \lambda^{k+1} = \lambda^k - H_0^{-1}\big(A(2x^{k+1} - x^k) - b\big). \tag{1.10b} \]

Hence, for the model (1.1) with $\mathcal{X} = \mathbb{R}^n$, the balanced ALM (1.10) is reduced to

\[ x^{k+1} = \mathrm{Prox}_{\theta}(q_0^k), \tag{1.11a} \]

\[ \lambda^{k+1} = \lambda^k - H_0^{-1}\big(A(2x^{k+1} - x^k) - b\big). \tag{1.11b} \]

In the x-subproblem (1.10a), it is easy to see that $\theta(x)$ and $A$ are decoupled while the
parameter $r$ is not restricted by any condition related to $\|A^TA\|$, explicitly or implicitly, and
thus it could be as easy as evaluating $\mathrm{Prox}_{\theta}$ when $\mathcal{X} = \mathbb{R}^n$. Moreover, the $\lambda$-subproblem (1.10b)
is still very easy, though it involves the matrix $A$ and thus becomes slightly more difficult than
(1.2b) or (1.7b); see Remark 2.1 for more details. In this sense, the subproblems of the classic
ALM (1.2) are balanced in (1.10). The balanced ALM (1.10) keeps the proximity-induced
feature while it avoids possibly tiny step sizes for the subproblems even when
$\|A^TA\|$ is large. This is an essential difference of the balanced ALM (1.10) from the PALM (1.6)
and the primal-dual method (1.7). We consider the balanced ALM (1.10) a necessary
supplement to the classic ALM (1.2), especially for the case where $\mathrm{Prox}_{\theta}(x)$ has a closed-form
representation but $\|A^TA\|$ is large.
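
To make the scheme (1.11) concrete, here is a minimal sketch for the special case $\theta(x) = \|x\|_1$ and $\mathcal{X} = \mathbb{R}^n$, so that the x-update is soft-thresholding. The function names, the parameter values $r = 1$, $\delta = 0.1$, the iteration count, and the tiny test problem are all illustrative assumptions rather than choices made in the text.

```python
import numpy as np

def soft_threshold(z, tau):
    """Componentwise soft-thresholding, i.e. the prox of tau * ||.||_1."""
    return np.sign(z) * np.maximum(np.abs(z) - tau, 0.0)

def balanced_alm_l1(A, b, r=1.0, delta=0.1, iters=2000):
    """Sketch of (1.11) for min ||x||_1 subject to Ax = b."""
    m, n = A.shape
    H0 = A @ A.T / r + delta * np.eye(m)      # matrix (1.9)
    L = np.linalg.cholesky(H0)                # factor H0 = L L^T once
    x, lam = np.zeros(n), np.zeros(m)
    for _ in range(iters):
        q = x + A.T @ lam / r                 # q_0^k
        x_new = soft_threshold(q, 1.0 / r)    # x-update (1.11a)
        s = A @ (2.0 * x_new - x) - b
        # lambda-update (1.11b): solve H0 d = s with the stored factor.
        d = np.linalg.solve(L.T, np.linalg.solve(L, s))
        lam = lam - d
        x = x_new
    return x, lam

# Assumed toy data: min |x1| + |x2| subject to x1 + 2*x2 = 2,
# whose solution is x = (0, 1) with multiplier 0.5.
A = np.array([[1.0, 2.0]])
b = np.array([2.0])
x, lam = balanced_alm_l1(A, b)
print(x, lam)
```

Note that neither `r` nor `delta` is tied to the size of $\|A^TA\|$ here; the matrix $A$ enters only through the linear solve with $H_0$.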

The rest of this paper is organized as follows. We state the model to be considered and
generalize the balanced ALM (1.10) for this model in Section 2. Then, we conduct convergence
analysis for the balanced ALM in Section 3. In Section 4, we extend our discussion to separable
convex programming models and propose a splitting version of the balanced ALM. An alternative
strategy for balancing is discussed in Section 5. In Section 6, from the PPA perspective, we briefly
discuss how to further generalize the algorithms proposed in the preceding sections. Finally, some
conclusions are made in Section 7.


2 Model and algorithm 


Note that the classic ALM was proposed in the context of the canonical convex
programming model with linear equality constraints (1.1). Although our initial aim was to consider
the balanced ALM for the model (1.1), the balanced ALM (1.10) can in fact be
generalized to the more general convex programming model with both linear equality and inequality
constraints. We thus present our work in this more general setting below.


2.1 Model 


Instead of (1.1), let us consider the following more general convex programming model with
both linear equality and inequality constraints:

\[ \min\{\theta(x) \mid Ax = b \ (\text{or} \ge b),\ x \in \mathcal{X}\}, \tag{2.1} \]

in which the setting is the same as in (1.1) except that the linear inequality $Ax \ge b$ is also included.
The solution set of (2.1) is assumed to be nonempty throughout our discussion. With the
consideration of the more general model (2.1), the applicable range of the algorithm to be
proposed is thus wider than that of the classic ALM (1.2).


2.2 Algorithm 


Now, let us generalize the balanced ALM (1.10) to the model (2.1), and the name remains for 
simplicity. Recall that our main purpose is to balance the subproblems in the classic ALM (1.2). 





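Algorithm: the balanced ALM for (2.1)
Let $r > 0$ and $\delta > 0$ be arbitrary constants; let $H_0$ be defined in (1.9). Denote

\[ q_0^k := x^k + \frac{1}{r}A^T\lambda^k \quad \text{and} \quad s_0^k := A(2x^{k+1} - x^k) - b. \]

Then, with $(x^k, \lambda^k)$, the new iterate $(x^{k+1}, \lambda^{k+1})$ is generated via the following steps:

\[ x^{k+1} = \arg\min\Big\{\theta(x) + \frac{r}{2}\|x - q_0^k\|^2 \;\Big|\; x \in \mathcal{X}\Big\}, \tag{2.2a} \]

\[ \lambda^{k+1} = \arg\min\Big\{\frac{1}{2}(\lambda - \lambda^k)^T H_0 (\lambda - \lambda^k) + (s_0^k)^T\lambda \;\Big|\; \lambda \in \Lambda\Big\}. \tag{2.2b} \]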











Remark 2.1. It is easy to see that the balanced ALM (2.2) is reduced to the aforementioned
(1.10) if the model (1.1) is considered. In particular, we have $\Lambda = \mathbb{R}^m$ and thus the subproblem
(2.2b) is reduced to finding $\lambda^{k+1}$ such that

\[ H_0(\lambda^{k+1} - \lambda^k) = -s_0^k. \]

When $Ax \ge b$ is considered in (2.1), we have $\Lambda = \mathbb{R}^m_+$. For this case, the subproblem (2.2b) is
reduced to the standard quadratic programming problem with non-negative sign constraints

\[ \min\Big\{\frac{1}{2}(\lambda - \lambda^k)^T H_0 (\lambda - \lambda^k) + (s_0^k)^T\lambda \;\Big|\; \lambda \in \mathbb{R}^m_+\Big\}, \]

or equivalently, the linear complementarity problem

\[ 0 \le \lambda \perp \{H_0(\lambda - \lambda^k) + s_0^k\} \ge 0. \]

Recall that the matrix $H_0$ defined in (1.9) is positive definite and it can be well conditioned with
appropriate choices of $r$ and $\delta$. Hence, it is extremely easy to decompose $H_0$, e.g., by the Cholesky
decomposition. Many benchmark solvers, including the well-known Lemke algorithm and the
conjugate gradient method, can then be found in various textbooks (e.g., [73,[2A[35)), monographs
(e.g., [9)), and papers (e.g., Ajaj).
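
The two cases of the $\lambda$-subproblem described in Remark 2.1 can be sketched as follows. For the equality case a Cholesky factorization solves the linear system; for the inequality case the sketch uses a plain projected-gradient loop, which is a simple stand-in chosen here for brevity and is not the Lemke algorithm mentioned above. The function names, the iteration count, and the sample data are assumptions.

```python
import numpy as np

def lambda_step_equality(H0, lam_k, s_k):
    """Lambda = R^m: solve H0 (lam - lam_k) = -s_k via Cholesky."""
    L = np.linalg.cholesky(H0)
    d = np.linalg.solve(L.T, np.linalg.solve(L, s_k))
    return lam_k - d

def lambda_step_inequality(H0, lam_k, s_k, iters=500):
    """Lambda = R^m_+: projected gradient on
    0.5*(lam-lam_k)^T H0 (lam-lam_k) + s_k^T lam subject to lam >= 0."""
    step = 1.0 / np.linalg.eigvalsh(H0).max()   # 1 / Lipschitz constant
    lam = np.maximum(lam_k, 0.0)
    for _ in range(iters):
        grad = H0 @ (lam - lam_k) + s_k
        lam = np.maximum(lam - step * grad, 0.0)  # project onto lam >= 0
    return lam

# Assumed sample data with a diagonal H0 so the answer is easy to verify.
H0 = np.diag([2.0, 4.0])
lam_k = np.array([1.0, 1.0])
s_k = np.array([-2.0, 8.0])
lam_eq = lambda_step_equality(H0, lam_k, s_k)
lam_ineq = lambda_step_inequality(H0, lam_k, s_k)
print(lam_eq, lam_ineq)
```

In this example the unconstrained solution has a negative second entry, so the non-negativity constraint clips it to zero and the complementarity condition holds with a strictly positive residual in that component.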


Remark 2.2. Recall that the balanced ALM (2.2) is featured by the fact that the function $\theta(x)$
and the coefficient matrix $A$ are decoupled without any explicit or implicit condition related to
$A$ in (2.2a). Like the PALM (1.6) and the primal-dual method (1.7), the balanced
ALM (2.2) has two parameters, $\delta$ and $r$, but here they are only required to be positive. As we will
show in Section 3, the only essential role of $\delta$ is to theoretically ensure the positive definiteness
of the corresponding matrix $H$ as defined in (3.7). Therefore, there is no particular motivation
to tune $\delta$ for different applications, and it can simply be fixed as a small value beforehand. For the
parameter $r$, just as for the same parameter in the classic ALM (1.2), there is full flexibility
in tuning. Certainly, how to tune $r$ depends on the specific model and dataset under
discussion, and there is no generic and unified theory that determines the optimal choice for all
cases.


3 Convergence analysis 


In this section, we prove the convergence of the balanced ALM (2.2), and estimate its worst-case 
convergence rate measured by the iteration complexity. 


3.1 Variational inequality characterization of (2.1) 


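Following our previous works [22, 23], our analysis will be conducted in the variational inequality
(VI) context. We first derive the VI characterization for the optimality condition of the model
(2.1). Let the Lagrangian function of the problem (2.1) be defined as

\[ L(x, \lambda) = \theta(x) - \lambda^T(Ax - b), \tag{3.1} \]

with $\lambda \in \mathbb{R}^m$ the Lagrange multiplier. Since both linear equality and inequality constraints are
considered in (2.1), let us define

\[ \Omega := \mathcal{X} \times \Lambda \quad \text{where} \quad \Lambda = \begin{cases} \mathbb{R}^m, & \text{if } Ax = b, \\ \mathbb{R}^m_+, & \text{if } Ax \ge b. \end{cases} \tag{3.2} \]

The pair $(x^*, \lambda^*) \in \Omega$ is called a saddle point of the Lagrangian function (3.1) if it satisfies the
inequalities

\[ L_{\lambda \in \Lambda}(x^*, \lambda) \le L(x^*, \lambda^*) \le L_{x \in \mathcal{X}}(x, \lambda^*). \]

Alternatively, we can write these inequalities as the following VIs:

\[ \begin{cases} x^* \in \mathcal{X}, & \theta(x) - \theta(x^*) + (x - x^*)^T(-A^T\lambda^*) \ge 0, \quad \forall x \in \mathcal{X}, \\ \lambda^* \in \Lambda, & (\lambda - \lambda^*)^T(Ax^* - b) \ge 0, \quad \forall \lambda \in \Lambda, \end{cases} \tag{3.3} \]

or in the compact format

\[ w^* \in \Omega, \quad \theta(x) - \theta(x^*) + (w - w^*)^T F(w^*) \ge 0, \quad \forall w \in \Omega, \tag{3.4a} \]

where

\[ w = \begin{pmatrix} x \\ \lambda \end{pmatrix}, \quad F(w) = \begin{pmatrix} -A^T\lambda \\ Ax - b \end{pmatrix} \quad \text{and} \quad \Omega = \mathcal{X} \times \Lambda. \tag{3.4b} \]

Note that the operator $F$ defined in (3.4b) is affine with a skew-symmetric matrix and thus we
have

\[ (w - \bar{w})^T(F(w) - F(\bar{w})) = 0, \quad \forall w, \bar{w}. \tag{3.5} \]

We also call (3.4) a monotone mixed variational inequality because the function $\theta$ is convex and
the operator $F$ has the property (3.5). We denote by $\Omega^*$ the solution set of the VI (3.4); it is
also the set of the saddle points of the Lagrangian function (3.1).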


3.2 Contraction 


We need to show that the sequence generated by the balanced ALM (2.2) is contractive with
respect to $\Omega^*$, the solution set of the VI (3.4). This is the key property to ensure its convergence.
Before that, let us recall a basic lemma whose proof is elementary and can be found in, e.g., [1].


Lemma 3.1. Let $\mathcal{X} \subseteq \mathbb{R}^n$ be a closed convex set, and let $\theta(x)$ and $f(x)$ be convex functions. If $f$ is
differentiable, and the solution set of the minimization problem

\[ \min\{\theta(x) + f(x) \mid x \in \mathcal{X}\} \]

is nonempty, then it holds that

\[ x^* \in \arg\min\{\theta(x) + f(x) \mid x \in \mathcal{X}\} \tag{3.6a} \]

if and only if

\[ x^* \in \mathcal{X}, \quad \theta(x) - \theta(x^*) + (x - x^*)^T \nabla f(x^*) \ge 0, \quad \forall x \in \mathcal{X}. \tag{3.6b} \]


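To show the contraction property of the sequence generated by the balanced ALM (2.2), the
first step is to quantify the difference between an iterate generated by the balanced ALM (2.2) and a
solution point $w^* \in \Omega^*$. Recall the definition of $H_0$ in (1.9). Let us define

\[ H = \begin{pmatrix} rI_n & A^T \\ A & \frac{1}{r}AA^T + \delta I_m \end{pmatrix} = \begin{pmatrix} rI_n & A^T \\ A & H_0 \end{pmatrix}. \tag{3.7} \]

Proposition 3.1. The matrix $H$ defined in (3.7) is positive definite.


Proof. Notice that

\[ H = \begin{pmatrix} rI_n & A^T \\ A & \frac{1}{r}AA^T \end{pmatrix} + \begin{pmatrix} 0 & 0 \\ 0 & \delta I_m \end{pmatrix} = \begin{pmatrix} \sqrt{r}I_n \\ \frac{1}{\sqrt{r}}A \end{pmatrix} \begin{pmatrix} \sqrt{r}I_n & \frac{1}{\sqrt{r}}A^T \end{pmatrix} + \begin{pmatrix} 0 & 0 \\ 0 & \delta I_m \end{pmatrix}. \]

Thus, for any $w = (x, \lambda) \ne 0$, we have

\[ w^T H w = \Big\|\sqrt{r}x + \frac{1}{\sqrt{r}}A^T\lambda\Big\|^2 + \delta\|\lambda\|^2 > 0, \]

and therefore the matrix $H$ is positive definite.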
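
Proposition 3.1 can be checked numerically. The following minimal sketch assembles $H$ from (3.7) for an assumed small matrix $A$ and assumed values of $r$ and $\delta$, then compares $w^THw$ with the sum-of-squares expression used in the proof; the helper name `build_H` is hypothetical.

```python
import numpy as np

def build_H(A, r, delta):
    """Assemble the block matrix H of (3.7)."""
    m, n = A.shape
    H0 = A @ A.T / r + delta * np.eye(m)
    return np.block([[r * np.eye(n), A.T], [A, H0]])

# Assumed sample data.
A = np.array([[1.0, 2.0, 0.0], [0.0, 1.0, -1.0]])
r, delta = 0.5, 0.1
H = build_H(A, r, delta)

# Smallest eigenvalue of the symmetric matrix H (positive if H is PD).
min_eig = np.linalg.eigvalsh(H).min()

# Compare w^T H w with the expression from the proof.
x = np.array([1.0, -2.0, 0.5])
lam = np.array([0.3, -1.0])
w = np.concatenate([x, lam])
lhs = w @ H @ w
rhs = np.linalg.norm(np.sqrt(r) * x + A.T @ lam / np.sqrt(r)) ** 2 + delta * (lam @ lam)
print(min_eig, lhs, rhs)
```

Setting `delta = 0` in this sketch would leave $H$ only positive semidefinite, which illustrates why $\delta > 0$ is needed in the theory.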
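In the following theorem, we will express the difference between an iterate generated by the balanced
ALM (2.2) and a solution point $w^* \in \Omega^*$ in the context of VIs.


Theorem 3.1. Let $\{w^k = (x^k, \lambda^k)\}$ be the sequence generated by the balanced ALM (2.2) and
$H$ be defined in (3.7). Then we have

\[ w^{k+1} \in \Omega, \quad \theta(x) - \theta(x^{k+1}) + (w - w^{k+1})^T F(w^{k+1}) \ge (w - w^{k+1})^T H (w^k - w^{k+1}), \quad \forall w \in \Omega. \tag{3.8} \]

Proof. According to Lemma 3.1, the solution $x^{k+1}$ of the subproblem (2.2a) can be characterized
by the VI

\[ x^{k+1} \in \mathcal{X}, \quad \theta(x) - \theta(x^{k+1}) + (x - x^{k+1})^T\{-A^T\lambda^k + r(x^{k+1} - x^k)\} \ge 0, \quad \forall x \in \mathcal{X}. \]

Then, for any unknown $\lambda^{k+1}$, we have

\[ x^{k+1} \in \mathcal{X}, \quad \theta(x) - \theta(x^{k+1}) + (x - x^{k+1})^T(-A^T\lambda^{k+1}) \ge (x - x^{k+1})^T\{r(x^k - x^{k+1}) + A^T(\lambda^k - \lambda^{k+1})\}, \quad \forall x \in \mathcal{X}. \tag{3.9} \]

Similarly, because of Lemma 3.1, the solution $\lambda^{k+1}$ of the subproblem (2.2b) can be characterized
by the VI

\[ \lambda^{k+1} \in \Lambda, \quad (\lambda - \lambda^{k+1})^T\{(A(2x^{k+1} - x^k) - b) + H_0(\lambda^{k+1} - \lambda^k)\} \ge 0, \quad \forall \lambda \in \Lambda. \]

Recall the definition of $H_0$ in (1.9). We thus have

\[ \lambda^{k+1} \in \Lambda, \quad (\lambda - \lambda^{k+1})^T(Ax^{k+1} - b) \ge (\lambda - \lambda^{k+1})^T\Big\{A(x^k - x^{k+1}) + \Big(\frac{1}{r}AA^T + \delta I_m\Big)(\lambda^k - \lambda^{k+1})\Big\}, \quad \forall \lambda \in \Lambda. \tag{3.10} \]

Combining (3.9) and (3.10), and using the notation in (3.4), we obtain the assertion (3.8).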


In the following theorem, we will prove an important inequality which measures the difference
between an iterate generated by the balanced ALM (2.2) and a solution point $w^* \in \Omega^*$ more explicitly
by H-norm-induced distances. This inequality is also the basis for estimating the convergence
rate measured by the iteration complexity for the balanced ALM (2.2).


Theorem 3.2. Let $\{w^k = (x^k, \lambda^k)\}$ be the sequence generated by the balanced ALM (2.2) and
$H$ be defined in (3.7). Then we have

\[ \theta(x) - \theta(x^{k+1}) + (w - w^{k+1})^T F(w) \ge \frac{1}{2}\big(\|w - w^{k+1}\|_H^2 - \|w - w^k\|_H^2\big) + \frac{1}{2}\|w^k - w^{k+1}\|_H^2, \quad \forall w \in \Omega. \tag{3.11} \]

Proof. It follows from (3.5) that

\[ (w - w^{k+1})^T F(w^{k+1}) = (w - w^{k+1})^T F(w), \]

and thus the left-hand side of (3.8) equals

\[ \theta(x) - \theta(x^{k+1}) + (w - w^{k+1})^T F(w). \]

Consequently, because of (3.8), we get

\[ w^{k+1} \in \Omega, \quad \theta(x) - \theta(x^{k+1}) + (w - w^{k+1})^T F(w) \ge (w - w^{k+1})^T H (w^k - w^{k+1}), \quad \forall w \in \Omega. \tag{3.12} \]

Applying the identity

\[ b^T H (b - a) = \frac{1}{2}\{\|b\|_H^2 - \|a\|_H^2\} + \frac{1}{2}\|a - b\|_H^2 \]

to the right-hand side of (3.12) with $a = w - w^k$ and $b = w - w^{k+1}$, we thus obtain

\[ (w - w^{k+1})^T H (w^k - w^{k+1}) = \frac{1}{2}\big(\|w - w^{k+1}\|_H^2 - \|w - w^k\|_H^2\big) + \frac{1}{2}\|w^k - w^{k+1}\|_H^2. \tag{3.13} \]

Substituting (3.13) into the right-hand side of (3.12), we prove the assertion (3.11).
Now, with Theorems 3.1 and 3.2, the contraction property of the sequence generated by the
balanced ALM (2.2) with respect to $\Omega^*$ can be proved.


Theorem 3.3. Let $\{w^k = (x^k, \lambda^k)\}$ be the sequence generated by the balanced ALM (2.2) and
$H$ be defined in (3.7). Then we have

\[ \|w^{k+1} - w^*\|_H^2 \le \|w^k - w^*\|_H^2 - \|w^k - w^{k+1}\|_H^2, \quad \forall w^* \in \Omega^*. \tag{3.14} \]

Proof. Setting $w$ in (3.11) as any fixed $w^* \in \Omega^*$, we get

\[ \|w^k - w^*\|_H^2 - \|w^{k+1} - w^*\|_H^2 - \|w^k - w^{k+1}\|_H^2 \ge 2\{\theta(x^{k+1}) - \theta(x^*) + (w^{k+1} - w^*)^T F(w^*)\}, \quad \forall w^* \in \Omega^*. \]

Since $w^* \in \Omega^*$ and $w^{k+1} \in \Omega$, according to (3.4), the right-hand side of the last inequality is
non-negative. Thus, the assertion of this theorem follows directly.
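
The contraction inequality (3.14) can be observed numerically on a toy problem. The sketch below assumes $\theta(x) = \|x\|_1$, $\mathcal{X} = \mathbb{R}^2$, the single constraint $x_1 + 2x_2 = 2$ (whose unique saddle point is $x^* = (0, 1)$, $\lambda^* = 0.5$), and the illustrative parameters $r = 1$, $\delta = 0.1$; the starting point and all variable names are assumptions.

```python
import numpy as np

# Assumed toy problem: min ||x||_1 subject to x1 + 2*x2 = 2.
A = np.array([[1.0, 2.0]])
b = np.array([2.0])
r, delta = 1.0, 0.1
m, n = A.shape
H0 = A @ A.T / r + delta * np.eye(m)            # (1.9)
H = np.block([[r * np.eye(n), A.T], [A, H0]])   # (3.7)
w_star = np.array([0.0, 1.0, 0.5])              # (x*, lambda*)

def h_norm_sq(v):
    """Squared H-norm of v."""
    return v @ H @ v

x, lam = np.array([3.0, -1.0]), np.array([2.0])
gaps = []
for _ in range(50):
    q = x + A.T @ lam / r
    x_new = np.sign(q) * np.maximum(np.abs(q) - 1.0 / r, 0.0)      # (2.2a)
    lam_new = lam - np.linalg.solve(H0, A @ (2.0 * x_new - x) - b)  # (2.2b)
    w, w_new = np.concatenate([x, lam]), np.concatenate([x_new, lam_new])
    # Slack in (3.14); the theorem states that it is non-negative.
    gaps.append(h_norm_sq(w - w_star) - h_norm_sq(w_new - w_star)
                - h_norm_sq(w - w_new))
    x, lam = x_new, lam_new
print(min(gaps))
```

Each entry of `gaps` is the amount by which the squared H-norm distance to the solution decreases beyond $\|w^k - w^{k+1}\|_H^2$.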














3.3 Convergence 


With the contraction property established in Theorem 3.3, it is easy to prove the convergence
of the sequence $\{w^k\}$ generated by the balanced ALM (2.2).


Theorem 3.4. Let $\{w^k = (x^k, \lambda^k)\}$ be the sequence generated by the balanced ALM (2.2) and
$H$ be defined in (3.7). Then, the sequence $\{w^k\}$ converges to some $w^\infty \in \Omega^*$.


Proof. First of all, it follows from (3.14) that the sequence $\{w^k\}$ is bounded and

\[ \lim_{k \to \infty} \|w^k - w^{k+1}\|_H^2 = 0. \tag{3.15} \]

Let $w^\infty$ be a cluster point of $\{w^k\}$ and $\{w^{k_j}\}$ be a subsequence converging to $w^\infty$. It follows
from (3.8) that

\[ w^{k_j+1} \in \Omega, \quad \theta(x) - \theta(x^{k_j+1}) + (w - w^{k_j+1})^T F(w^{k_j+1}) \ge (w - w^{k_j+1})^T H (w^{k_j} - w^{k_j+1}), \quad \forall w \in \Omega. \]

Since the matrix $H$ is positive definite, it follows from (3.15) and the continuity of $\theta(x)$ and
$F(w)$ that

\[ w^\infty \in \Omega, \quad \theta(x) - \theta(x^\infty) + (w - w^\infty)^T F(w^\infty) \ge 0, \quad \forall w \in \Omega. \]

The VI above indicates that $w^\infty$ is a solution point of (3.4). Finally, because of (3.14), we have

\[ \|w^{k+1} - w^\infty\|_H^2 \le \|w^k - w^\infty\|_H^2, \]

and thus $\{w^k\}$ converges to $w^\infty$. The proof is complete.





3.4 Convergence rate 


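Following the VI-based technique established in our earlier work [23], we can estimate the
worst-case O(1/t) convergence rate measured by the iteration complexity for the balanced ALM (2.2),
where $t$ is the iteration counter.

Let us recall some necessary details which can also be found in [23]. If $\tilde{w}$ is a solution point
of the VI (3.4), then we have

\[ \tilde{w} \in \Omega, \quad \theta(x) - \theta(\tilde{x}) + (w - \tilde{w})^T F(\tilde{w}) \ge 0, \quad \forall w \in \Omega. \]

Because of (3.5), $\tilde{w}$ also satisfies

\[ \tilde{w} \in \Omega, \quad \theta(x) - \theta(\tilde{x}) + (w - \tilde{w})^T F(w) \ge 0, \quad \forall w \in \Omega. \]

Thus, for given $\epsilon > 0$, $\tilde{w} \in \Omega$ is called an $\epsilon$-approximate solution of the VI (3.4) if it satisfies

\[ \tilde{w} \in \Omega, \quad \theta(x) - \theta(\tilde{x}) + (w - \tilde{w})^T F(w) \ge -\epsilon, \quad \forall w \in \mathcal{D}_{(\tilde{w})}, \tag{3.16} \]

where

\[ \mathcal{D}_{(\tilde{w})} = \{w \in \Omega \mid \|w - \tilde{w}\| \le 1\}. \]

Thus, to establish the worst-case O(1/t) convergence rate for the balanced ALM (2.2), we need
to show that, for given $\epsilon > 0$, after $t$ iterations, we can find $\tilde{w}$ such that

\[ \tilde{w} \in \Omega \quad \text{and} \quad \sup_{w \in \mathcal{D}_{(\tilde{w})}} \{\theta(\tilde{x}) - \theta(x) + (\tilde{w} - w)^T F(w)\} \le \epsilon = O(1/t). \tag{3.17} \]

We present this result in the following theorem.


Theorem 3.5. Let $\{w^k = (x^k, \lambda^k)\}$ be the sequence generated by the balanced ALM (2.2) and
$H$ be defined in (3.7). For any integer $t > 0$, if we define

\[ \tilde{w}_t = \frac{1}{t+1}\sum_{k=0}^{t} w^{k+1}, \tag{3.18} \]

then we have

\[ \tilde{w}_t \in \Omega, \quad \theta(\tilde{x}_t) - \theta(x) + (\tilde{w}_t - w)^T F(w) \le \frac{1}{2(t+1)}\|w - w^0\|_H^2, \quad \forall w \in \Omega. \tag{3.19} \]

Proof. First, it follows from (3.11) that, for all $k \ge 0$, we have

\[ \theta(x) - \theta(x^{k+1}) + (w - w^{k+1})^T F(w) + \frac{1}{2}\|w - w^k\|_H^2 \ge \frac{1}{2}\|w - w^{k+1}\|_H^2, \quad \forall w \in \Omega. \tag{3.20} \]

Summing the inequalities (3.20) over $k = 0, 1, \ldots, t$, we obtain

\[ (t+1)\theta(x) - \sum_{k=0}^{t}\theta(x^{k+1}) + \Big((t+1)w - \sum_{k=0}^{t} w^{k+1}\Big)^T F(w) + \frac{1}{2}\|w - w^0\|_H^2 \ge 0, \quad \forall w \in \Omega. \]

It follows from (3.18) that

\[ \frac{1}{t+1}\sum_{k=0}^{t}\theta(x^{k+1}) - \theta(x) + (\tilde{w}_t - w)^T F(w) \le \frac{1}{2(t+1)}\|w - w^0\|_H^2, \quad \forall w \in \Omega. \tag{3.21} \]

Note that $\tilde{w}_t$ defined in (3.18) is a convex combination of the iterates $w^{k+1}$ for $k = 0, \ldots, t$, and
$\theta(x)$ is convex. We thus have

\[ \tilde{x}_t = \frac{1}{t+1}\sum_{k=0}^{t} x^{k+1}, \]

and also

\[ \theta(\tilde{x}_t) \le \frac{1}{t+1}\sum_{k=0}^{t}\theta(x^{k+1}). \]

Substituting it into (3.21), the assertion (3.19) of this theorem follows directly.

Then, because of (3.17), the inequality (3.19) indicates that $\tilde{w}_t$ defined in (3.18), which is the
average of the first $t+1$ iterates generated by the balanced ALM (2.2), is an approximate solution
of the VI (3.4) with an accuracy of O(1/t). Hence, the worst-case O(1/t) convergence rate
measured by the iteration complexity is established for the balanced ALM (2.2) in the ergodic
sense.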


4 Splitting versions of the balanced ALM (2.2) for separable convex programming


The classic ALM (1.2) plays an extremely influential role in solving various separable cases of
the generic model (1.1) when the objective function of such a model can be represented as the
sum of multiple functions without coupled variables. For these separable models, the classic
ALM (1.2) has been adapted into various splitting versions by decomposing the original
x-subproblem (1.2a) into smaller ones. These splitting versions take advantage of the separable
structure in the model more effectively; the decomposed subproblems are usually easier in the
sense that each of them only needs to tackle one function component. For various applications,
including the mentioned sparsity- and low-rank-promoting ones in data science domains, splitting
versions of the ALM (1.2) may generate subproblems that are easy enough to have closed-form
solutions. Among various splitting versions of the classic ALM (1.2), the most popular one is
probably the mentioned ADMM [13], which suggests splitting the x-subproblem (1.2a) into
two subproblems solved sequentially when the model (1.1) has a two-block separable structure.

In this section, in parallel with the successful legacy of the classic ALM (1.2) and its splitting
versions, we also discuss how to design splitting versions for the balanced ALM (2.2) when the
model (2.1) is separable. For succinctness and without ambiguity, we reuse some letters and
notation from Sections 2 and 3.


4.1 Model


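Let us consider the separable convex programming model with both linear equality and inequality
constraints

\[ \min\Big\{\sum_{i=1}^{p}\theta_i(x_i) \;\Big|\; \sum_{i=1}^{p} A_i x_i = b \ (\text{or} \ge b),\ x_i \in \mathcal{X}_i,\ i = 1, \ldots, p\Big\}, \tag{4.1} \]

where $\theta_i : \mathbb{R}^{n_i} \to \mathbb{R}$, $i = 1, \ldots, p$, are closed proper convex functions and they are not necessarily
smooth; $\mathcal{X}_i \subseteq \mathbb{R}^{n_i}$, $i = 1, \ldots, p$, are closed convex sets; $A_i \in \mathbb{R}^{m \times n_i}$, $i = 1, \ldots, p$, are given
matrices; and $b \in \mathbb{R}^m$ is a given vector. The model (4.1) can be regarded as an extension of the
model (2.1) from $p = 1$ to $p \ge 2$; that is, we only consider the multiple-block separable case with
$p \ge 2$. Similarly to (3.2), we reuse the letters and define

\[ \Omega = \prod_{i=1}^{p}\mathcal{X}_i \times \Lambda \quad \text{where} \quad \Lambda = \begin{cases} \mathbb{R}^m, & \text{if } \sum_{i=1}^{p} A_i x_i = b, \\ \mathbb{R}^m_+, & \text{if } \sum_{i=1}^{p} A_i x_i \ge b. \end{cases} \tag{4.2} \]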


4.2 Algorithm 


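Now, we extend the balanced ALM (2.2) to the multiple-block separable convex programming
model (4.1) and present a splitting version of (2.2) below.


Algorithm: A splitting version of the balanced ALM (2.2) for (4.1)
Let $r_i > 0$ for $i = 1, 2, \ldots, p$, and $\delta > 0$ be arbitrary constants. Define

\[ H_p = \sum_{i=1}^{p}\frac{1}{r_i}A_iA_i^T + \delta I_m, \tag{4.3} \]

\[ q_i^k := x_i^k + \frac{1}{r_i}A_i^T\lambda^k, \ \text{for } i = 1, 2, \ldots, p; \quad \text{and} \quad s_p^k := \sum_{i=1}^{p} A_i(2x_i^{k+1} - x_i^k) - b. \]

Then, with $w^k = (x_1^k, x_2^k, \ldots, x_p^k, \lambda^k)$, the new iterate $w^{k+1} = (x_1^{k+1}, x_2^{k+1}, \ldots, x_p^{k+1}, \lambda^{k+1})$ is
generated via the following steps:

\[ x_i^{k+1} \in \arg\min\Big\{\theta_i(x_i) + \frac{r_i}{2}\|x_i - q_i^k\|^2 \;\Big|\; x_i \in \mathcal{X}_i\Big\}, \quad i = 1, 2, \ldots, p; \tag{4.4a} \]

\[ \lambda^{k+1} = \arg\min\Big\{\frac{1}{2}(\lambda - \lambda^k)^T H_p (\lambda - \lambda^k) + (s_p^k)^T\lambda \;\Big|\; \lambda \in \Lambda\Big\}. \tag{4.4b} \]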




Remark 4.1. The subproblems in (4.4) are of the same structure as those in (2.2). For the
$x_i$-subproblem (4.4a), the function $\theta_i(x_i)$ and the coefficient matrix $A_i$ are decoupled without any explicit or
implicit condition related to $A_i$, and thus it is also reduced to evaluating the proximity operator
of $\theta_i(x_i)$ when $\mathcal{X}_i = \mathbb{R}^{n_i}$. In addition, the $\lambda$-subproblem (4.4b) is a positive definite system of
linear equations or a standard quadratic programming problem with non-negative sign constraints. Note
that the $r_i$'s are only required to be positive. Hence, the algorithm (4.4)
keeps all features of the balanced ALM (2.2).


Remark 4.2. For generality, we consider different $r_i$ for different $x_i$-subproblems. They can be
identical for simplicity. As with the balanced ALM (2.2), to implement the algorithm (4.4)
empirically we can fix $\delta$ as a small positive constant throughout.
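
A minimal sketch of the splitting scheme (4.4) for $p = 2$ is given below. It assumes scalar blocks with $A_1 = A_2 = 1$, $\theta_1(x_1) = |x_1|$, $\theta_2(x_2) = \frac{1}{2}(x_2 - c)^2$, one equality constraint $x_1 + x_2 = b$, and $\mathcal{X}_1 = \mathcal{X}_2 = \mathbb{R}$; the function name, the parameter defaults, and the data are illustrative assumptions.

```python
import numpy as np

def split_balanced_alm(b, c, r1=1.0, r2=1.0, delta=0.1, iters=500):
    """Sketch of (4.4) for min |x1| + 0.5*(x2 - c)^2 s.t. x1 + x2 = b."""
    Hp = 1.0 / r1 + 1.0 / r2 + delta   # (4.3) with A1 = A2 = 1 and m = 1
    x1 = x2 = lam = 0.0
    for _ in range(iters):
        q1 = x1 + lam / r1
        q2 = x2 + lam / r2
        # (4.4a), block 1: prox of |.| is soft-thresholding with threshold 1/r1.
        x1_new = float(np.sign(q1)) * max(abs(q1) - 1.0 / r1, 0.0)
        # (4.4a), block 2: prox of 0.5*(. - c)^2 has a closed form.
        x2_new = (c + r2 * q2) / (1.0 + r2)
        # (4.4b) with Lambda = R: a scalar linear equation.
        s = (2.0 * x1_new - x1) + (2.0 * x2_new - x2) - b
        lam = lam - s / Hp
        x1, x2 = x1_new, x2_new
    return x1, x2, lam

# Assumed data: b = 3, c = 0, whose solution is x1 = 2, x2 = 1, lambda = 1.
x1, x2, lam = split_balanced_alm(b=3.0, c=0.0)
print(x1, x2, lam)
```

The two x-updates are independent of each other within an iteration, so they could be computed in parallel; only the $\lambda$-update couples the blocks through $H_p$.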
4.3 Convergence analysis


In this subsection, we follow the analysis in Section 3 and prove the convergence of the splitting
version of the balanced ALM (4.4).


4.3.1 Variational inequality characterization of (4.1) 


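For the purpose of convergence analysis, we also need the VI characterization for the optimality
condition of the model (4.1). Let $\lambda \in \mathbb{R}^m$ be the Lagrange multiplier of (4.1) and the Lagrangian
function of the problem (4.1) be defined as

\[ L(x_1, \ldots, x_p, \lambda) = \sum_{i=1}^{p}\theta_i(x_i) - \lambda^T\Big(\sum_{i=1}^{p} A_i x_i - b\Big). \tag{4.5} \]

Similarly to Section 3.1, we reuse the letters and know that finding a saddle point of $L(x_1, \ldots, x_p, \lambda)$
can be written as the following VI:

\[ w^* \in \Omega, \quad \theta(x) - \theta(x^*) + (w - w^*)^T F(w^*) \ge 0, \quad \forall w \in \Omega, \tag{4.6a} \]

where

\[ w = \begin{pmatrix} x_1 \\ \vdots \\ x_p \\ \lambda \end{pmatrix}, \quad x = \begin{pmatrix} x_1 \\ \vdots \\ x_p \end{pmatrix}, \quad \theta(x) = \sum_{i=1}^{p}\theta_i(x_i), \quad F(w) = \begin{pmatrix} -A_1^T\lambda \\ \vdots \\ -A_p^T\lambda \\ \sum_{i=1}^{p} A_i x_i - b \end{pmatrix}, \tag{4.6b} \]

and $\Omega$ is defined in (4.2). Again, we denote by $\Omega^*$ the solution set of the VI (4.6).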


4.3.2 Convergence 


Let us recall the proofs in Section 3 for the convergence of the balanced ALM (2.2). It is easy
to see that the crucial step is to identify the difference between an iterate and a solution point
by the inequality (3.8) in Theorem 3.1, in which the matrix $H$ should be positive definite as
proved in Proposition 3.1 so that the difference can be measured by distances defined by the
H-norm. After Proposition 3.1 and Theorem 3.1 are proved, the remaining part of the proof
for the convergence as well as the worst-case convergence rate is routine. Hence, to prove
the convergence of the splitting version of the balanced ALM (4.4), we only need to prove an
inequality similar to (3.8) in which the accompanying matrix is also positive definite.


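Proposition 4.1. Let $r_i > 0$ for $i = 1, 2, \ldots, p$, and $\delta > 0$ be arbitrary constants. The matrix
defined as

\[ H = \begin{pmatrix} r_1 I_{n_1} & 0 & \cdots & 0 & A_1^T \\ 0 & \ddots & \ddots & \vdots & \vdots \\ \vdots & \ddots & \ddots & 0 & \vdots \\ 0 & \cdots & 0 & r_p I_{n_p} & A_p^T \\ A_1 & \cdots & \cdots & A_p & \sum_{i=1}^{p}\frac{1}{r_i}A_iA_i^T + \delta I_m \end{pmatrix} \tag{4.7} \]

is positive definite.

Proof. Note that

\[ H = \sum_{i=1}^{p} H_i + \begin{pmatrix} 0 & 0 \\ 0 & \delta I_m \end{pmatrix}, \]

where the only nonzero blocks of $H_i$, located in the $i$-th and the last block rows and columns, are

\[ \begin{pmatrix} r_i I_{n_i} & A_i^T \\ A_i & \frac{1}{r_i}A_iA_i^T \end{pmatrix} = \begin{pmatrix} \sqrt{r_i}I_{n_i} \\ \frac{1}{\sqrt{r_i}}A_i \end{pmatrix} \begin{pmatrix} \sqrt{r_i}I_{n_i} & \frac{1}{\sqrt{r_i}}A_i^T \end{pmatrix}. \]

For any $w = (x_1, \ldots, x_p, \lambda) \ne 0$, we have

\[ w^T H w = \sum_{i=1}^{p}\Big\|\sqrt{r_i}x_i + \frac{1}{\sqrt{r_i}}A_i^T\lambda\Big\|^2 + \delta\|\lambda\|^2 > 0. \]

Hence, the matrix $H$ is positive definite.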














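Theorem 4.1. Let $\{w^k = (x_1^k, \ldots, x_p^k, \lambda^k)\}$ be the sequence generated by the balanced ALM
(4.4) and $H$ be defined in (4.7). Then, we have

\[ w^{k+1} \in \Omega, \quad \theta(x) - \theta(x^{k+1}) + (w - w^{k+1})^T F(w^{k+1}) \ge (w - w^{k+1})^T H (w^k - w^{k+1}), \quad \forall w \in \Omega. \tag{4.8} \]

Proof. According to Lemma 3.1, for $i = 1, 2, \ldots, p$, we have

\[ x_i^{k+1} \in \mathcal{X}_i, \quad \theta_i(x_i) - \theta_i(x_i^{k+1}) + (x_i - x_i^{k+1})^T\{-A_i^T\lambda^k + r_i(x_i^{k+1} - x_i^k)\} \ge 0, \quad \forall x_i \in \mathcal{X}_i. \]

Then, for any unknown $\lambda^{k+1}$, we have

\[ x_i^{k+1} \in \mathcal{X}_i, \quad \theta_i(x_i) - \theta_i(x_i^{k+1}) + (x_i - x_i^{k+1})^T(-A_i^T\lambda^{k+1}) \ge (x_i - x_i^{k+1})^T\{r_i(x_i^k - x_i^{k+1}) + A_i^T(\lambda^k - \lambda^{k+1})\}, \quad \forall x_i \in \mathcal{X}_i. \tag{4.9} \]

Also because of Lemma 3.1, $\lambda^{k+1}$ generated by (4.4b) is characterized by the VI

\[ \lambda^{k+1} \in \Lambda, \quad (\lambda - \lambda^{k+1})^T\Big\{\sum_{i=1}^{p} A_i(2x_i^{k+1} - x_i^k) - b + \Big(\sum_{j=1}^{p}\frac{1}{r_j}A_jA_j^T + \delta I_m\Big)(\lambda^{k+1} - \lambda^k)\Big\} \ge 0, \quad \forall \lambda \in \Lambda. \]

It can be rewritten as

\[ \lambda^{k+1} \in \Lambda, \quad (\lambda - \lambda^{k+1})^T\Big(\sum_{i=1}^{p} A_i x_i^{k+1} - b\Big) \ge (\lambda - \lambda^{k+1})^T\Big\{\sum_{i=1}^{p} A_i(x_i^k - x_i^{k+1}) + \Big(\sum_{j=1}^{p}\frac{1}{r_j}A_jA_j^T + \delta I_m\Big)(\lambda^k - \lambda^{k+1})\Big\}, \quad \forall \lambda \in \Lambda. \tag{4.10} \]

Combining (4.9) and (4.10), and using the notation in (4.6), we prove the assertion (4.8).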


As mentioned, based on Proposition 4.1 and Theorem 4.1, conclusions similar to Theorems
3.2-3.5 can be proved in the same way. Thus, convergence results similar to those in Section 3 can be
obtained for the splitting version of the balanced ALM (4.4); we omit the details for succinctness.

5 An alternative strategy for balancing


The balanced ALM (2.2) can be generalized to the splitting version (4.4) if the model under
discussion is changed from the one-block case (2.1) to the multiple-block case (4.1). There
are other ways to carry out this generalization, in addition to the technique introduced in Section 4. In
(4.4), we see that none of the $x_i$-subproblems involves a quadratic term in the form of
$\frac{r_i}{2}\|A_i x_i - q_i^k\|^2$, so that each can be reduced to evaluating the proximity operator of $\theta_i(x_i)$ when
$\mathcal{X}_i = \mathbb{R}^{n_i}$. In this sense, all such $x_i$-subproblems are preferred when it is easy to evaluate the
proximity operator of $\theta_i(x_i)$. On the other hand, all $A_i$'s are aggregated in the $\lambda$-subproblem (4.4b)
because of the matrix $H_p$ defined in (4.3). For some cases where some or all $\|A_i^TA_i\|$ are
large (or some or all $A_i$'s are ill-conditioned), it is preferable to alleviate the quadratic
programming problem (4.4b) by removing such $A_iA_i^T$ from $H_p$. Hence, from a methodological
point of view, it is also interesting to ask whether we can keep terms in the form of $\|A_i x_i - q_i^k\|^2$ for
some $x_i$-subproblems (4.4a), and meanwhile remove the corresponding $A_iA_i^T$ from the matrix
$H_p$ in (4.3) so that the $\lambda$-subproblem (4.4b) becomes easier. Accordingly, we propose to revise
the splitting version of the balanced ALM (4.4) such that some $x_i$-subproblems are in the form of

\[ x_i^{k+1} \in \arg\min\Big\{\theta_i(x_i) + \frac{r_i}{2}\|A_i x_i - q_i^k\|^2 \;\Big|\; x_i \in \mathcal{X}_i\Big\}, \]

with $q_i^k$ a certain constant vector, and the corresponding $A_iA_i^T$ is excluded in the $\lambda$-subproblem
(4.4b). This idea provides an alternative strategy for balancing the generated subproblems, and
it enables a user to determine how to balance the difficulty of subproblems in accordance with
the specific functions $\theta_i$'s, coefficient matrices $A_i$'s, and sets $\mathcal{X}_i$'s for a given application.


5.1 Model 


For succinctness of notation, let us just take the special case of (4.1) with p = 2 and only linear 
equality constraints to illustrate our idea: 


\[
\min \{ \theta_1(x_1) + \theta_2(x_2) \mid A_1 x_1 + A_2 x_2 = b, \; x_1 \in \mathcal{X}_1, \; x_2 \in \mathcal{X}_2 \}. \tag{5.1}
\]


Again, without ambiguity, some letters and notation are reused. 


5.2 Algorithm 


An alternative splitting version of the balanced ALM (2.2) for the specific model (5.1) can be 
presented as below. 





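Algorithm: An alternative splitting version of the balanced ALM for (5.1)
Let $r > 0$, $s > 0$ and $\delta > 0$ be arbitrary constants. Define
\[
H_2 = \frac{1}{s} A_2 A_2^T + \Bigl( \frac{1}{r} + \delta \Bigr) I_m, \tag{5.2}
\]
\[
q_2^k := x_2^k + \frac{1}{s} A_2^T \lambda^k \quad \text{and} \quad s_2^k := A_1 (2x_1^{k+1} - x_1^k) + A_2 (2x_2^{k+1} - x_2^k) - b.
\]
Then, with $w^k = (x_1^k, x_2^k, \lambda^k)$, the new iterate $w^{k+1} = (x_1^{k+1}, x_2^{k+1}, \lambda^{k+1})$ is generated via the
following steps:
\begin{align}
x_1^{k+1} &\in \arg\min \Bigl\{ \theta_1(x_1) - x_1^T A_1^T \lambda^k + \frac{r}{2}\|A_1(x_1 - x_1^k)\|^2 + \frac{\delta}{2}\|x_1 - x_1^k\|^2 \;\Big|\; x_1 \in \mathcal{X}_1 \Bigr\}, \tag{5.3a} \\
x_2^{k+1} &= \arg\min \Bigl\{ \theta_2(x_2) + \frac{s}{2}\|x_2 - q_2^k\|^2 \;\Big|\; x_2 \in \mathcal{X}_2 \Bigr\}, \tag{5.3b} \\
\lambda^{k+1} &= \arg\min \Bigl\{ \frac{1}{2}(\lambda - \lambda^k)^T H_2 (\lambda - \lambda^k) + (s_2^k)^T \lambda \;\Big|\; \lambda \in \Lambda \Bigr\}. \tag{5.3c}
\end{align}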
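To make the three steps concrete, here is a minimal NumPy sketch of one pass of (5.3) under illustrative assumptions that are not part of the paper: $\theta_1(x_1) = \frac{1}{2}\|x_1 - c\|^2$, $\theta_2(x_2) = \mu\|x_2\|_1$, $\mathcal{X}_1 = \mathbb{R}^{n_1}$, $\mathcal{X}_2 = \mathbb{R}^{n_2}$ and $\Lambda = \mathbb{R}^m$. The names `step_53`, `soft_threshold` and `make_problem` are hypothetical helpers introduced only for this illustration.

```python
import numpy as np


def soft_threshold(v, tau):
    """Proximity operator of tau * ||.||_1 (componentwise shrinkage)."""
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)


def step_53(x1, x2, lam, A1, A2, b, c, mu, r=1.0, s=1.0, delta=0.1):
    """One pass of (5.3a)-(5.3c) for the illustrative choices
    theta1(x1) = 0.5*||x1 - c||^2 and theta2(x2) = mu*||x2||_1,
    with X1, X2 and Lambda taken as the whole space (assumption)."""
    m, n1 = A1.shape
    # (5.3a): strongly convex quadratic; its minimizer solves
    # (I + M) x1 = c + A1^T lam^k + M x1^k with M = r A1^T A1 + delta I.
    M = r * A1.T @ A1 + delta * np.eye(n1)
    x1_new = np.linalg.solve(np.eye(n1) + M, c + A1.T @ lam + M @ x1)
    # (5.3b): proximity operator of theta2 at q2 = x2^k + A2^T lam^k / s.
    x2_new = soft_threshold(x2 + A2.T @ lam / s, mu / s)
    # (5.3c) with Lambda = R^m: H2 (lam_new - lam^k) + s2 = 0.
    H2 = A2 @ A2.T / s + (1.0 / r + delta) * np.eye(m)
    s2 = A1 @ (2 * x1_new - x1) + A2 @ (2 * x2_new - x2) - b
    lam_new = lam - np.linalg.solve(H2, s2)
    return x1_new, x2_new, lam_new


def make_problem(seed=0, m=3, n1=4, n2=5):
    """Random instance of (5.1) with a known KKT point (x1*, 0, lam*)."""
    rng = np.random.default_rng(seed)
    A1 = rng.standard_normal((m, n1))
    A2 = rng.standard_normal((m, n2))
    x1s, lams = rng.standard_normal(n1), rng.standard_normal(m)
    x2s = np.zeros(n2)
    mu = 1.0 + np.max(np.abs(A2.T @ lams))  # A2^T lam* lies in mu * subdiff of ||.||_1 at 0
    c = x1s - A1.T @ lams                   # stationarity: x1* - c - A1^T lam* = 0
    b = A1 @ x1s + A2 @ x2s                 # feasibility
    return A1, A2, b, c, mu, (x1s, x2s, lams)


if __name__ == "__main__":
    A1, A2, b, c, mu, _ = make_problem()
    x1, x2, lam = np.zeros(4), np.zeros(5), np.zeros(3)
    for k in range(500):
        x1, x2, lam = step_53(x1, x2, lam, A1, A2, b, c, mu)
    print("constraint residual:", np.linalg.norm(A1 @ x1 + A2 @ x2 - b))
```

In this sketch only the $x_1$-step and the $\lambda$-step require a linear solve, and the $\lambda$-step matrix involves $A_2$ but not $A_1$; the $x_2$-step is a plain shrinkage.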
Remark 5.1. In the algorithm (5.3), we see that only the $x_2$-subproblem (5.3b) can be reduced to estimating the proximity operator of $\theta_2$ if $\mathcal{X}_2 = \mathbb{R}^{n_2}$, while the $x_1$-subproblem (5.3a) is not proximity-induced because the term $\|A_1(x_1 - x_1^k)\|^2$ is kept. As a balance, the matrix $H_2$ defined in (5.2), which determines the quadratic programming problem (5.3c), does not involve $A_1$. In this sense, the $x_i$-subproblems and the $\lambda$-subproblem are balanced in another way. For the generic model (4.1) with $p > 2$, the splitting version of the balanced ALM (4.4) can be revised in the sense that some of its $x_i$-subproblems are flexibly chosen to keep the terms $\|A_i(x_i - x_i^k)\|^2$ whilst the quadratic term determining the $\lambda$-subproblem does not involve the corresponding $A_i$'s. Thus, different algorithms with differently balanced subproblems can be designed analogously. The algorithm (5.3) is just the simplest illustration of this philosophy, with $p = 2$.


5.3 Convergence results 


As mentioned, to prove the convergence of the algorithm (5.3), we just need to prove an inequality
similar to (3.8) in Theorem 3.1 and show that the accompanying matrix is positive definite.


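Proposition 5.1. Let $r > 0$, $s > 0$, and $\delta > 0$ be arbitrary constants. Then, the matrix defined as
\[
H = \begin{pmatrix}
r A_1^T A_1 + \delta I_{n_1} & 0 & A_1^T \\
0 & s I_{n_2} & A_2^T \\
A_1 & A_2 & \frac{1}{s} A_2 A_2^T + \bigl( \frac{1}{r} + \delta \bigr) I_m
\end{pmatrix} \tag{5.4}
\]
is positive definite.
Proof. Note that
\[
H = \begin{pmatrix}
r A_1^T A_1 + \delta I_{n_1} & 0 & A_1^T \\
0 & 0 & 0 \\
A_1 & 0 & \frac{1}{r} I_m
\end{pmatrix}
+ \begin{pmatrix}
0 & 0 & 0 \\
0 & s I_{n_2} & A_2^T \\
0 & A_2 & \frac{1}{s} A_2 A_2^T
\end{pmatrix}
+ \begin{pmatrix}
0 & 0 & 0 \\
0 & 0 & 0 \\
0 & 0 & \delta I_m
\end{pmatrix}.
\]
For any $w = (x, y, \lambda) \neq 0$, we have
\[
w^T H w = \Bigl( \bigl\| \sqrt{r} A_1 x + \tfrac{1}{\sqrt{r}} \lambda \bigr\|^2 + \delta \|x\|^2 \Bigr) + \bigl\| \sqrt{s}\, y + \tfrac{1}{\sqrt{s}} A_2^T \lambda \bigr\|^2 + \delta \|\lambda\|^2 > 0.
\]
Thus, the matrix $H$ is positive definite.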
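The statement of Proposition 5.1 can be checked numerically on a random instance. The sketch below (the function names `build_H` and `quadratic_form_split` and the chosen dimensions and parameter values are illustrative assumptions) assembles $H$ from (5.4), computes its smallest eigenvalue, and compares $w^T H w$ with the sum-of-squares expression used in the proof.

```python
import numpy as np


def build_H(A1, A2, r, s, delta):
    """Assemble the symmetric block matrix H of (5.4)."""
    m, n1 = A1.shape
    n2 = A2.shape[1]
    return np.block([
        [r * A1.T @ A1 + delta * np.eye(n1), np.zeros((n1, n2)), A1.T],
        [np.zeros((n2, n1)), s * np.eye(n2), A2.T],
        [A1, A2, A2 @ A2.T / s + (1.0 / r + delta) * np.eye(m)],
    ])


def quadratic_form_split(A1, A2, r, s, delta, x, y, lam):
    """Sum-of-squares expression for w^T H w used in the proof."""
    t1 = np.linalg.norm(np.sqrt(r) * A1 @ x + lam / np.sqrt(r)) ** 2
    t2 = np.linalg.norm(np.sqrt(s) * y + A2.T @ lam / np.sqrt(s)) ** 2
    return t1 + delta * (x @ x) + t2 + delta * (lam @ lam)


# Random test instance (sizes and parameters chosen arbitrarily).
rng = np.random.default_rng(1)
A1, A2 = rng.standard_normal((3, 4)), rng.standard_normal((3, 5))
r, s, delta = 2.0, 0.5, 1e-2
H = build_H(A1, A2, r, s, delta)

x, y, lam = rng.standard_normal(4), rng.standard_normal(5), rng.standard_normal(3)
w = np.concatenate([x, y, lam])
smallest_eigenvalue = np.linalg.eigvalsh(H).min()
gap = abs(w @ H @ w - quadratic_form_split(A1, A2, r, s, delta, x, y, lam))
```

A positive `smallest_eigenvalue` and a `gap` at rounding-error level are what the proposition and its proof predict for any positive $r$, $s$, $\delta$.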

















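Theorem 5.1. Let $\{w^k = (x_1^k, x_2^k, \lambda^k)\}$ be the sequence generated by the algorithm (5.3) and $H$
be defined in (5.4). Then, we have
\[
w^{k+1} \in \Omega, \quad \theta(u) - \theta(u^{k+1}) + (w - w^{k+1})^T F(w^{k+1}) \geq (w - w^{k+1})^T H (w^k - w^{k+1}), \quad \forall\, w \in \Omega. \tag{5.5}
\]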
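Proof. According to Lemma 3.1, $x_1^{k+1}$ generated by (5.3a) is characterized by the VI
\[
x_1^{k+1} \in \mathcal{X}_1, \quad \theta_1(x_1) - \theta_1(x_1^{k+1}) + (x_1 - x_1^{k+1})^T \bigl\{ -A_1^T \lambda^k + (r A_1^T A_1 + \delta I_{n_1})(x_1^{k+1} - x_1^k) \bigr\} \geq 0, \quad \forall\, x_1 \in \mathcal{X}_1.
\]
Then, for any unknown $\lambda^{k+1}$, we have
\[
\begin{aligned}
x_1^{k+1} \in \mathcal{X}_1, \quad & \theta_1(x_1) - \theta_1(x_1^{k+1}) + (x_1 - x_1^{k+1})^T (-A_1^T \lambda^{k+1}) \\
& \geq (x_1 - x_1^{k+1})^T \bigl\{ (r A_1^T A_1 + \delta I_{n_1})(x_1^k - x_1^{k+1}) + A_1^T (\lambda^k - \lambda^{k+1}) \bigr\}, \quad \forall\, x_1 \in \mathcal{X}_1.
\end{aligned}
\tag{5.6}
\]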


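Analogously, it follows from Lemma 3.1 that $x_2^{k+1}$ generated by (5.3b) can be characterized by
the VI
\[
x_2^{k+1} \in \mathcal{X}_2, \quad \theta_2(x_2) - \theta_2(x_2^{k+1}) + (x_2 - x_2^{k+1})^T \bigl\{ -A_2^T \lambda^k + s(x_2^{k+1} - x_2^k) \bigr\} \geq 0, \quad \forall\, x_2 \in \mathcal{X}_2.
\]
Then, for any unknown $\lambda^{k+1}$, we have
\[
\begin{aligned}
x_2^{k+1} \in \mathcal{X}_2, \quad & \theta_2(x_2) - \theta_2(x_2^{k+1}) + (x_2 - x_2^{k+1})^T (-A_2^T \lambda^{k+1}) \\
& \geq (x_2 - x_2^{k+1})^T \bigl\{ s(x_2^k - x_2^{k+1}) + A_2^T (\lambda^k - \lambda^{k+1}) \bigr\}, \quad \forall\, x_2 \in \mathcal{X}_2.
\end{aligned}
\tag{5.7}
\]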


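Similarly, according to Lemma 3.1, $\lambda^{k+1}$ generated by (5.3c) is characterized by the VI: Finding
$\lambda^{k+1} \in \Lambda$ such that
\[
(\lambda - \lambda^{k+1})^T \Bigl\{ \bigl( A_1 [2x_1^{k+1} - x_1^k] + A_2 [2x_2^{k+1} - x_2^k] - b \bigr) + \Bigl( \frac{1}{s} A_2 A_2^T + \bigl( \tfrac{1}{r} + \delta \bigr) I_m \Bigr)(\lambda^{k+1} - \lambda^k) \Bigr\} \geq 0, \quad \forall\, \lambda \in \Lambda.
\]
It can be rewritten as
\[
\begin{aligned}
\lambda^{k+1} \in \Lambda, \quad & (\lambda - \lambda^{k+1})^T (A_1 x_1^{k+1} + A_2 x_2^{k+1} - b) \\
& \geq (\lambda - \lambda^{k+1})^T \Bigl\{ A_1 (x_1^k - x_1^{k+1}) + A_2 (x_2^k - x_2^{k+1}) + \Bigl( \frac{1}{s} A_2 A_2^T + \bigl( \tfrac{1}{r} + \delta \bigr) I_m \Bigr)(\lambda^k - \lambda^{k+1}) \Bigr\}
\end{aligned}
\tag{5.8}
\]
for all $\lambda \in \Lambda$. Combining (5.6), (5.7) and (5.8), and using the notation in (3.4), we get the assertion (5.5).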
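The combination step can be spelled out as follows. This sketch assumes, by analogy with the notation (3.4) for the one-block case, that $u = (x_1, x_2)$, $\theta(u) = \theta_1(x_1) + \theta_2(x_2)$ and $F(w) = (-A_1^T\lambda,\, -A_2^T\lambda,\, A_1x_1 + A_2x_2 - b)$; these specializations are an assumption here, since they are not restated in this section.

```latex
% Adding (5.6), (5.7) and (5.8): the left-hand sides collect into
% theta(u) - theta(u^{k+1}) + (w - w^{k+1})^T F(w^{k+1}), and the
% right-hand sides fill the three block rows of H in (5.4).
\begin{align*}
& \theta(u) - \theta(u^{k+1})
  + \begin{pmatrix} x_1 - x_1^{k+1} \\ x_2 - x_2^{k+1} \\ \lambda - \lambda^{k+1} \end{pmatrix}^{T}
    \begin{pmatrix} -A_1^T \lambda^{k+1} \\ -A_2^T \lambda^{k+1} \\ A_1 x_1^{k+1} + A_2 x_2^{k+1} - b \end{pmatrix} \\
& \quad \geq
    \begin{pmatrix} x_1 - x_1^{k+1} \\ x_2 - x_2^{k+1} \\ \lambda - \lambda^{k+1} \end{pmatrix}^{T}
    \begin{pmatrix}
      r A_1^T A_1 + \delta I_{n_1} & 0 & A_1^T \\
      0 & s I_{n_2} & A_2^T \\
      A_1 & A_2 & \frac{1}{s} A_2 A_2^T + \bigl(\frac{1}{r} + \delta\bigr) I_m
    \end{pmatrix}
    \begin{pmatrix} x_1^k - x_1^{k+1} \\ x_2^k - x_2^{k+1} \\ \lambda^k - \lambda^{k+1} \end{pmatrix}.
\end{align*}
```

Row by row, the first block row of the matrix reproduces the right-hand side of (5.6), the second that of (5.7), and the third that of (5.8).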

















5.4 Comparison with linearized versions of the ADMM 


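It is interesting to compare the proposed algorithm (5.3) with the well-known linearized versions
of the ADMM. For the model (5.1), the original ADMM scheme reads as
\begin{align}
x_1^{k+1} &\in \arg\min \Bigl\{ \theta_1(x_1) - x_1^T A_1^T \lambda^k + \frac{r}{2}\|A_1 x_1 + A_2 x_2^k - b\|^2 \;\Big|\; x_1 \in \mathcal{X}_1 \Bigr\}, \tag{5.9a} \\
x_2^{k+1} &\in \arg\min \Bigl\{ \theta_2(x_2) - x_2^T A_2^T \lambda^k + \frac{r}{2}\|A_1 x_1^{k+1} + A_2 x_2 - b\|^2 \;\Big|\; x_2 \in \mathcal{X}_2 \Bigr\}, \tag{5.9b} \\
\lambda^{k+1} &= \lambda^k - r(A_1 x_1^{k+1} + A_2 x_2^{k+1} - b), \tag{5.9c}
\end{align}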


in which $r > 0$ is the penalty parameter and $\lambda \in \mathbb{R}^m$ is the Lagrange multiplier. The first
proximal version of the ADMM (PADMM) suggests regularizing both the $x_1$- and $x_2$-subproblems in (5.9) with generic proximal terms (see also [10] for a special
case). For simplicity, let us assume that the $x_1$-subproblem (5.9a) is easy but the $x_2$-subproblem
(5.9b) is difficult. Then, the PADMM can be written as
\begin{align}
x_1^{k+1} &\in \arg\min \Bigl\{ \theta_1(x_1) - x_1^T A_1^T \lambda^k + \frac{r}{2}\|A_1 x_1 + A_2 x_2^k - b\|^2 \;\Big|\; x_1 \in \mathcal{X}_1 \Bigr\}, \tag{5.10a} \\
x_2^{k+1} &\in \arg\min \Bigl\{ \theta_2(x_2) - x_2^T A_2^T \lambda^k + \frac{r}{2}\|A_1 x_1^{k+1} + A_2 x_2 - b\|^2 + \frac{1}{2}\|x_2 - x_2^k\|_G^2 \;\Big|\; x_2 \in \mathcal{X}_2 \Bigr\}, \tag{5.10b} \\
\lambda^{k+1} &= \lambda^k - r(A_1 x_1^{k+1} + A_2 x_2^{k+1} - b), \tag{5.10c}
\end{align}


in which $G \in \mathbb{R}^{n_2 \times n_2}$ is a positive definite matrix. For the same reason as mentioned for (1.5), it is interesting to consider "linearizing" the quadratic term "$\frac{r}{2}\|A_1 x_1^{k+1} + A_2 x_2 - b\|^2$" and thus reducing the subproblem (5.10b) to estimating the proximity operator of $\theta_2(x_2)$ when $\mathcal{X}_2 = \mathbb{R}^{n_2}$. Similar to (1.6), this can be done by choosing $G := s I_{n_2} - r A_2^T A_2$ in (5.10b). As well discussed in the literature, e.g., [11, 21, 26, 34, 35], for various applications arising in image processing, statistical learning, and others, the condition
\[
s > r \|A_2^T A_2\| \tag{5.11}
\]
is required to ensure the positive definiteness of $G$ and thus the convergence of (5.10). Recently, the condition (5.11) has been further optimally improved to $s > 0.75 \cdot r \|A_2^T A_2\|$. As in the case of (1.5a), although $\theta_2(x_2)$ and $A_2$ are decoupled in notation if $G := s I_{n_2} - r A_2^T A_2$ in (5.10b), the $x_2$-subproblem is correlated implicitly with $A_2$ via the condition (5.11) or its improved version. Hence, the efficiency of all existing linearized versions of the ADMM is severely affected if $\|A_2^T A_2\|$ is large. In this sense, the algorithm (5.3) improves on existing linearized versions of the ADMM: the $x_2$-subproblem (5.3b) can be reduced to estimating the proximity operator of $\theta_2$ if $\mathcal{X}_2 = \mathbb{R}^{n_2}$, while it is not affected by $\|A_2^T A_2\|$, and thus possibly tiny step sizes can be avoided even if $\|A_2^T A_2\|$ is large.
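The role of condition (5.11) can be illustrated numerically. The following sketch (helper names, the random matrix and the parameter values are illustrative assumptions) compares the linearized-ADMM proximal matrix $G = sI_{n_2} - rA_2^TA_2$ with the matrix $H_2$ of (5.2) for a value of $s$ far below the threshold $r\|A_2^TA_2\|$.

```python
import numpy as np


def min_eig(M):
    """Smallest eigenvalue of a symmetric matrix."""
    return float(np.linalg.eigvalsh(M).min())


def G_linearized(A2, r, s):
    """Proximal matrix G = s I - r A2^T A2 used to linearize (5.10b)."""
    return s * np.eye(A2.shape[1]) - r * A2.T @ A2


def H2_balanced(A2, r, s, delta):
    """Matrix H2 of (5.2) that defines the lambda-subproblem (5.3c)."""
    return A2 @ A2.T / s + (1.0 / r + delta) * np.eye(A2.shape[0])


rng = np.random.default_rng(2)
A2 = 10.0 * rng.standard_normal((3, 5))        # deliberately large ||A2^T A2||
r, delta = 1.0, 0.1
threshold = r * np.linalg.norm(A2.T @ A2, 2)   # right-hand side of (5.11)
s_small = 1.0                                  # far below the threshold

g_ok = min_eig(G_linearized(A2, r, 1.01 * threshold))    # just above (5.11)
g_bad = min_eig(G_linearized(A2, r, s_small))            # violates (5.11)
h_small = min_eig(H2_balanced(A2, r, s_small, delta))    # any s > 0 is allowed
```

Here `g_bad` is negative, so the linearized ADMM loses the positive definiteness of $G$ for this $s$, whereas `h_small` stays positive because $H_2$ is positive definite for every $s > 0$.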
6 More generalized versions


In the preceding sections, the balanced ALM (2.2) is proposed for the generic model (2.1), and
then its splitting versions (4.4) and (5.3) are studied for the separable models (4.1) and (5.1),
respectively. As mentioned, the classic ALM (1.2) is known to be an application of
the PPA proposed in [27]. In view of the generalized version of the PPA studied in [16], all the
proposed algorithms (2.2), (4.4) and (5.3) can be further generalized. For instance, the balanced
ALM (2.2) can be generalized as


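\begin{align}
\tilde{x}^k &= \arg\min \Bigl\{ \theta(x) + \frac{r}{2}\|x - q_0^k\|^2 \;\Big|\; x \in \mathcal{X} \Bigr\}, \tag{6.1a} \\
\tilde{\lambda}^k &= \arg\min \Bigl\{ \frac{1}{2}(\lambda - \lambda^k)^T H_0 (\lambda - \lambda^k) + (\tilde{s}_0^k)^T \lambda \;\Big|\; \lambda \in \Lambda \Bigr\}, \tag{6.1b} \\
\begin{pmatrix} x^{k+1} \\ \lambda^{k+1} \end{pmatrix} &= \begin{pmatrix} x^k \\ \lambda^k \end{pmatrix} - \alpha \begin{pmatrix} x^k - \tilde{x}^k \\ \lambda^k - \tilde{\lambda}^k \end{pmatrix} \quad \text{with } \alpha \in (0, 2), \tag{6.1c}
\end{align}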


where $\tilde{s}_0^k = A(2\tilde{x}^k - x^k) - b$. Clearly, the scheme (6.1) includes the balanced ALM (2.2) as
a special case with $\alpha = 1$. Numerically, the extra step (6.1c) has been shown to be able to
accelerate the convergence of the classic PPA for various problems. We refer to, e.g., [8, 22, 24],
for some empirical studies. Hence, it is natural to consider the generalized scheme (6.1) as a
replacement for the balanced ALM (2.2).
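A minimal NumPy sketch of the relaxed scheme (6.1) is given below for an illustrative instance: $\theta(x) = \frac{1}{2}\|x - c\|^2$, $\mathcal{X} = \mathbb{R}^n$ and $\Lambda = \mathbb{R}^m$ (equality constraints only). It assumes that $q_0^k = x^k + \frac{1}{r}A^T\lambda^k$ and $H_0 = \frac{1}{r}AA^T + \delta I_m$, by analogy with $q_2^k$ and $H_2$ in (5.2); these are defined with (2.2) and are not restated in this section. The function names are hypothetical.

```python
import numpy as np


def relaxed_balanced_alm_step(x, lam, A, b, c, r=1.0, delta=0.1, alpha=1.5):
    """One pass of (6.1a)-(6.1c) for theta(x) = 0.5*||x - c||^2,
    X = R^n and Lambda = R^m (illustrative assumptions)."""
    m = A.shape[0]
    q0 = x + A.T @ lam / r                    # assumed form of q_0^k
    x_t = (c + r * q0) / (1.0 + r)            # (6.1a): closed-form proximal step
    H0 = A @ A.T / r + delta * np.eye(m)      # assumed form of H_0
    s_t = A @ (2 * x_t - x) - b               # tilde{s}_0^k
    lam_t = lam - np.linalg.solve(H0, s_t)    # (6.1b) with Lambda = R^m
    x_new = x - alpha * (x - x_t)             # (6.1c): relaxation step
    lam_new = lam - alpha * (lam - lam_t)
    return x_new, lam_new, x_t, lam_t


def solution(A, b, c):
    """KKT point: projection of c onto {x : Ax = b} and its multiplier."""
    lam_star = -np.linalg.solve(A @ A.T, A @ c - b)
    return c + A.T @ lam_star, lam_star


if __name__ == "__main__":
    rng = np.random.default_rng(3)
    A, b, c = rng.standard_normal((3, 6)), rng.standard_normal(3), rng.standard_normal(6)
    x_star, _ = solution(A, b, c)
    x, lam = np.zeros(6), np.zeros(3)
    for k in range(300):
        x, lam, _, _ = relaxed_balanced_alm_step(x, lam, A, b, c)
    print("distance to the projection of c:", np.linalg.norm(x - x_star))
```

Setting `alpha=1.0` makes the relaxation step return the predictor itself, which corresponds to the special case mentioned above.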

To establish the convergence of (6.1), we just need to follow the roadmap in Section 3
and prove some similar theorems. For instance, the corresponding inequality in Section 3 can be
generalized as





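\[
\alpha \bigl( \theta(x) - \theta(\tilde{x}^k) + (w - \tilde{w}^k)^T F(w) \bigr) \geq \frac{1}{2} \bigl( \|w - w^{k+1}\|_H^2 - \|w - w^k\|_H^2 \bigr) + \frac{\alpha(2 - \alpha)}{2} \|w^k - \tilde{w}^k\|_H^2, \quad \forall\, w \in \Omega.
\]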


Moreover, the inequality (3.14) in Theorem 3.3 can be generalized as
\[
\|w^{k+1} - w^*\|_H^2 \leq \|w^k - w^*\|_H^2 - \alpha(2 - \alpha) \|w^k - \tilde{w}^k\|_H^2, \quad \forall\, w^* \in \Omega^*.
\]
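A short sketch of why these generalized inequalities hold is given next. It assumes that the predictor $\tilde{w}^k = (\tilde{x}^k, \tilde{\lambda}^k)$ produced by (6.1a)-(6.1b) satisfies the analogue of (3.8) with $w^{k+1}$ replaced by $\tilde{w}^k$, that $F$ is monotone, and that $H$ is the positive definite matrix of Section 3; these hypotheses are stated here as assumptions rather than quoted from the text.

```latex
\begin{align*}
% Step 1: assumed analogue of (3.8), combined with monotonicity of F.
\theta(x) - \theta(\tilde{x}^k) + (w - \tilde{w}^k)^T F(w)
  &\geq (w - \tilde{w}^k)^T H (w^k - \tilde{w}^k), \\
% Step 2: the relaxation step (6.1c) gives w^k - w^{k+1} = \alpha (w^k - \tilde{w}^k).
\alpha\, (w - \tilde{w}^k)^T H (w^k - \tilde{w}^k)
  &= (w - \tilde{w}^k)^T H (w^k - w^{k+1}) \\
% Step 3: identity (a-b)^T H (c-d)
%         = 1/2 (|a-d|^2 - |a-c|^2) + 1/2 (|c-b|^2 - |d-b|^2) in the H-norm.
  &= \tfrac{1}{2}\bigl( \|w - w^{k+1}\|_H^2 - \|w - w^k\|_H^2 \bigr)
   + \tfrac{1}{2}\bigl( \|w^k - \tilde{w}^k\|_H^2 - \|w^{k+1} - \tilde{w}^k\|_H^2 \bigr), \\
% Step 4: w^{k+1} - \tilde{w}^k = (1 - \alpha)(w^k - \tilde{w}^k).
\tfrac{1}{2}\bigl( \|w^k - \tilde{w}^k\|_H^2 - \|w^{k+1} - \tilde{w}^k\|_H^2 \bigr)
  &= \tfrac{1}{2}\bigl( 1 - (1 - \alpha)^2 \bigr) \|w^k - \tilde{w}^k\|_H^2
   = \tfrac{\alpha(2 - \alpha)}{2} \|w^k - \tilde{w}^k\|_H^2.
\end{align*}
% Setting w = w^* makes the left-hand side of Step 1 nonpositive,
% which yields the contraction inequality with the factor \alpha(2-\alpha).
```

The factor $\alpha(2 - \alpha)$ is positive exactly when $\alpha \in (0, 2)$, which is the range required in (6.1c).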


Then, based on these inequalities and analogously to the analysis in Section 3, convergence results for
the generalized version (6.1) of the balanced ALM can be obtained.

In addition, the extra step (6.1c) can be combined with the splitting versions of the balanced
ALM (4.4) and (5.3) as well, and thus some generalized versions of the algorithms (4.4) and
(5.3) can also be proposed. The details are omitted for succinctness.


7 Conclusions 


In this paper, we reshape the classic augmented Lagrangian method (ALM) by balancing its subproblems. Convex programming problems with both linear equality and inequality constraints
are considered. We propose a balanced ALM for the generic case, and various splitting versions for the separable cases. The balanced ALM and its splitting versions share the
feature that the subproblems are better balanced, and they are easier to implement for
various applications. The balanced ALM advances the classic ALM by enlarging its applicable
range, better balancing its subproblems, and improving its implementation. The balanced ALM
and its splitting versions enrich the literature on the classic ALM and its
variants from a novel perspective, and open the door to designing other application-tailored
algorithms of the same kind for more specific or complicated problems.